[Figure: two panels, "Cartesian coordinates" (axes x and y) and "Polar coordinates" (axes r and θ).]
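The figure contrasts a Cartesian and a polar representation. Here is a minimal sketch. It assumes the standard conversion r = √(x² + y²), θ = atan2(y, x), and it uses hypothetical toy data on two concentric circles, not the data plotted in the figure. In polar coordinates, a single threshold on r separates the two classes.

```python
import math

def to_polar(x, y):
    """Return (r, theta) for the Cartesian point (x, y); theta = atan2(y, x) in (-pi, pi]."""
    return math.hypot(x, y), math.atan2(y, x)

# Hypothetical toy data (assumption): 8 points on each of two concentric circles.
angles = [2 * math.pi * k / 8 for k in range(8)]
inner = [(0.3 * math.cos(a), 0.3 * math.sin(a)) for a in angles]  # class 0
outer = [(1.0 * math.cos(a), 1.0 * math.sin(a)) for a in angles]  # class 1

# In polar coordinates the classes differ only in r, so the vertical line
# r = threshold in the (r, theta) plane separates them.
threshold = 0.65
inner_ok = all(to_polar(x, y)[0] < threshold for x, y in inner)
outer_ok = all(to_polar(x, y)[0] > threshold for x, y in outer)
print(inner_ok, outer_ok)
```

No straight line in the x-y plane separates these points, because the inner circle lies inside the outer one. In the (r, θ) plane, the line r = 0.65 does.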
[Figure: a deep model as a stack of layers: visible layer (input pixels) → 1st hidden layer (edges) → 2nd hidden layer (corners and contours) → 3rd hidden layer (object parts) → output (object identity: CAR, PERSON, ANIMAL).]
[Figure: two computational graphs that compute σ(wᵀx) from inputs x₁, x₂ and weights w₁, w₂. One is built from the element set {×, +, σ}; the other is built from the element set {Logistic Regression}.]
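[Figure: Venn diagram of nested fields, AI ⊃ machine learning ⊃ representation learning ⊃ deep learning. Each field has one example: knowledge bases (AI), logistic regression (machine learning), shallow autoencoders (representation learning), and MLPs (deep learning).]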
[Figure: flowcharts comparing four approaches.
Rule-based systems: input → hand-designed program → output.
Classic machine learning: input → hand-designed features → mapping from features → output.
Representation learning: input → features → mapping from features → output.
Deep learning: input → simplest features → most complex features → mapping from features → output.]
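Inputs: $n$ values $x_1, \ldots, x_n$. Output: $y$. Weights: $w_1, \ldots, w_n$.

$$f(\boldsymbol{x}, \boldsymbol{w}) = x_1 w_1 + \cdots + x_n w_n$$

XOR values: $f([0, 1], \boldsymbol{w}) = 1$, $f([1, 0], \boldsymbol{w}) = 1$, $f([1, 1], \boldsymbol{w}) = 0$, and $f([0, 0], \boldsymbol{w}) = 0$.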
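The linear model f(x, w) = x₁w₁ + ⋯ + xₙwₙ cannot produce the four XOR values f([0,1], w) = 1, f([1,0], w) = 1, f([1,1], w) = 0, f([0,0], w) = 0. The first two force w₁ = 1 and w₂ = 1, so f([1,1], w) = w₁ + w₂ = 2 rather than 0.

The Python below is a minimal sketch that checks this numerically for n = 2. The least-squares fit is a choice made for this illustration, not an argument taken from the text, and the names `f`, `XOR`, and `least_squares_w` are hypothetical.

```python
from fractions import Fraction

def f(x, w):
    """Bias-free linear model: f(x, w) = x1*w1 + ... + xn*wn."""
    return sum(xi * wi for xi, wi in zip(x, w))

# The four XOR input/target pairs.
XOR = {(0, 1): 1, (1, 0): 1, (1, 1): 0, (0, 0): 0}

def least_squares_w(data):
    """Squared-error-minimizing w for n = 2: solve the normal equations
    (X^T X) w = X^T y with Cramer's rule, in exact rational arithmetic."""
    a = sum(Fraction(x[0] * x[0]) for x in data)           # sum of x1^2
    b = sum(Fraction(x[0] * x[1]) for x in data)           # sum of x1*x2
    d = sum(Fraction(x[1] * x[1]) for x in data)           # sum of x2^2
    p = sum(Fraction(x[0] * y) for x, y in data.items())   # sum of x1*y
    q = sum(Fraction(x[1] * y) for x, y in data.items())   # sum of x2*y
    det = a * d - b * b
    return [(p * d - b * q) / det, (a * q - b * p) / det]

w = least_squares_w(XOR)
mse = sum((f(x, w) - y) ** 2 for x, y in XOR.items()) / len(XOR)
print("w =", w, "mse =", mse)
```

Under these assumptions, the best fit is w = [1/3, 1/3], with a mean squared error of 1/3. Since even the best w leaves nonzero error, no choice of w reproduces all four values.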
[Figure: "Increasing dataset size over time". Axes: year (1900 to 2015) vs. dataset size (number of examples, logarithmic scale, 10⁰ to 10⁹). Labelled datasets: Iris, MNIST, Public SVHN, ImageNet, CIFAR-10, ImageNet10k, ILSVRC 2014, Sports-1M, Rotated T vs C, T vs G vs F, Criminals, Canadian Hansard, WMT English→French.]
[Figure: "Number of connections per neuron over time". Axes: year (1950 to 2015) vs. connections per neuron (logarithmic scale, 10¹ to 10⁴). Networks are numbered 1 to 10. Reference levels are shown for fruit fly, mouse, cat, and human.]
[Figure: "Increasing neural network size over time". Axes: year (1950 to 2056) vs. number of neurons (logarithmic scale, 10⁻² to 10¹¹). Networks are numbered 1 to 20. Reference levels are shown for sponge, roundworm, leech, ant, bee, frog, octopus, and human.]
[Figure: "Decreasing error rate over time". Axes: year (2010 to 2014) vs. ILSVRC classification error rate (0.05 to 0.30).]